Abstract

This is a detailed survey which mainly presents the Pinkham-Feller
approach. I added some new points to the first version [V2] and I removed
the "Examples" devoted to the Gamma, Fréchet and Weibull laws. Theorem 2
is a bit more general (no assumption of density: this answers a question
of T. Hill). Section 10 is new and devoted to an argument (Poincaré,
Fewster) about the effect of high-frequency oscillations. Perhaps many
works and much effort have been devoted to the study of a sufficient
condition of poor value: see Sections 7 and 11. The final Section gives
some suggestions.

1 Introduction



F. Benford (1938) (and earlier S. Newcomb [N, 1881]) observed that, in
numerical data written in base 10, the first digit, which is an integer
between 1 and 9, very often takes the value 1 with a frequency much greater
than 1/9, namely close to log 2 = 0.3010... (see footnote 1). More generally
the Benford phenomenon would be that the first digit in base 10, let us
denote it by D, follows the "law":

$$P(D = k) = \log\Big(\frac{k+1}{k}\Big) \qquad (k \in \{1, \dots, 9\}). \tag{1}$$

(Note that $\sum_{k=1}^{9} \log\big(\frac{k+1}{k}\big) = \log(10) - \log(1) = 1$.)
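As a quick numerical check of (1), the following minimal Python sketch tabulates the nine probabilities and their sum. The function name `benford_probabilities` is only an illustrative choice made here.

```python
import math

def benford_probabilities():
    """Return {k: log10((k + 1) / k)} for k = 1, ..., 9, i.e. the law (1)."""
    return {k: math.log10((k + 1) / k) for k in range(1, 10)}

if __name__ == "__main__":
    probs = benford_probabilities()
    for k, p in probs.items():
        print(k, round(p, 4))
    # The sum telescopes to log10(10) - log10(1) = 1 (up to rounding).
    print("sum:", sum(probs.values()))
```

The probabilities decrease with k, which is the ordering that inequality (2) of Section 2 also produces for decreasing densities.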



* Mathématiques, Université Montpellier II, place Eugène Bataillon, Case courrier 051,
34095 Montpellier Cedex, France, email: mivaladier@wanadoo.fr

1 We denote by log the logarithm in base 10. The logarithm in base e will be denoted
by ln.



Comments. The Benford phenomenon is not so intuitive. On the contrary:
for example, integers with 4 digits go from 1000 to 9999. "Random" would
give the probability 1/9000 for each and the probability 1/9 for {D = 1}.
For more see Raimi [R1], [R2] and Janvresse [J]. Not all data can satisfy the
Benford phenomenon. As said by Scott and Fasli [SF] (just before Section 3),
the height of men will give essentially D = 1 if expressed in meters and, if
expressed in feet (see footnote 2), D will take mainly the values 4, 5 and 6.
An academic example: if the law of X is uniform on [1,2], D = 1 almost surely.
For negative examples see Section 8.2.



This paper is devoted to the first digit of a random variable and not
to the other digits or to dynamical systems. And I will not discuss papers
relying on Fourier Analysis such as Pinkham [Pi, 1961], Good [Go, 1986],
Boyle [Bo, 1994] (note also that Fourier arguments are used several times
in [BH2]). The literature is tremendous: Hürlimann [Hu] gives, up to 2006,
around 350 references; and Berger and Hill [BH3] quote around 600 papers
(see also [Bee]). For French popular-science papers see [Hi4], [D], [J].
Maybe I missed some important results.

I will discuss some arguments starting from Feller [Fel, 1966] and quote
especially Pinkham [Pi, 1961], Engel-Leuenberger [EL, 2003],
Dümbgen-Leuenberger [DL, 2008], Gauvrit-Delahaye [GD1], [GD2], [GD3]
(2008-2009), Berger [Br, 2010].

A mathematical argument going back to Pinkham [Pi, 1961] is: if the
density g of log X is well spread then Benford is approximately satisfied
(see footnote 3). In his book Feller [Fel, pp. 62-63] quickly summarized this
result (and quotes Poincaré's roulette) with an elementary proof which
contains a flaw (see Section 4 below). The Feller hypothesis is: g is
unimodal, and the smallness of the maximum ensures the spread. Exactly the
same approach can be made correct, see [GD1], [GD2] (explained in detail in
Section 5 below). Moreover Dümbgen and Leuenberger [DL] proved far better
bounds relying firstly on the total variation of g and further on derivatives
of g. I explain the bounds relying on total variation in Section 6.

A "disaster" appears: see Section 7. Indeed, for such usual families of
laws on $\mathbb{R}_+$ as the exponential law (density $f(x) = \lambda e^{-\lambda x}$)
or the uniform law (density $(b-a)^{-1} 1_{[a,b]}$), the increasing spread of $f$
when $\lambda \to 0$, resp. $b \to +\infty$, is not transmitted to g. See Section 8.1
for examples of the effect of multiplication by x (see footnote 4), [Br] for a
careful discussion of the uniform law (summarized in [BH2, Prop. 4.15 p. 37]),
[EL] for the exponential law, and [BH1] for a critical review of several
arguments (the word "fallacious" used in this paper, and also in [BH2, p. 39
before Th.4.17], seems to have a less pejorative meaning in English than
"fallacieux" has in French).

2 The foot equals 0.3048 meter.

3 It should be noted that the first part of [Pi], about scale invariance, is not correct
(see [BH2, Section 4.2 p. 40]) and that the second part, which is what we quote, relies
on Fourier. Scale invariance was correctly studied by Hill [Hi1, 1995], [BH2, Th.4.20].

Section 9 gives naive results. Section 10 analyses some arguments of
Fewster's paper [Few] going back to Poincaré [Po, 1912]. Section 11 suggests
some conclusions.

2 Preliminaries 

Let X be a random variable (briefly r.v.) with values in $\mathbb{R}_+^* = ]0, +\infty[$. Let
us denote by $D(\omega)$ the first digit in base 10 of $X(\omega)$. It belongs to $\{1, \dots, 9\}$.
Let $n \in \mathbb{Z}$ and $k \in \{1, \dots, 9\}$; when X belongs to the interval $[10^n, 10^{n+1}[$,

$D = k$ is equivalent to $X \in [k\,10^n, (k+1)\,10^n[$.

We abbreviate $\{\omega ; D(\omega) = k\}$ as $\{D = k\}$. The following covering is a
partition (pairwise disjoint subsets):

$$\{D = k\} = \bigcup_{n \in \mathbb{Z}} \{X \in [k\,10^n, (k+1)\,10^n[\}.$$

The following, by Block and Savits [BS], is certainly the most convincing
early qualitative argument in the direction of Benford: if the density $f$ is
(strictly) decreasing (see footnote 5) on $\mathbb{R}_+^* = ]0, +\infty[$ then

$$P(D = 1) > P(D = 2) > \dots > P(D = 9). \tag{2}$$

This comes from the formula

$$P(D = k) = \sum_{n \in \mathbb{Z}} \int_{k\,10^n}^{(k+1)\,10^n} f(x)\,dx.$$

Note that the gaps in (2) can be very small: take $f$ affine on [0, 1] with a
small slope (for example $f(0) = 1 + \varepsilon$, $f(1) = 1 - \varepsilon$ and $f(x) = 0$ elsewhere
(see footnote 6)).



4 As shown by (11) below, one passes from $f$ to $g$ by multiplying essentially by $10^y = x$.
The maximum (resp. the total variation) of $g$ equals the maximum (resp. total variation)
of $x \mapsto \ln(10)\,x f(x)$.

5 Such a density has its greatest values near 0. The density $\frac{1_{[10^p, 10^q]}(x)}{(q-p)\ln(10)\,x}$, where $p < q$,
$p, q \in \mathbb{Z}$, is null on a neighborhood of 0 but obeys Benford exactly. For more on this,
see Remark 1 after Theorem 1.

6 A strictly decreasing $C^\infty$ descent to 0 when x varies from $1 - \eta$ to $+\infty$, with > 0
values, is possible without seriously altering this example.



Plenty of arguments are expressed with the r.v. $Y := \log(X)$, which
takes its values in $\mathbb{R}$. With this r.v. the following partition holds

$$\{D = k\} = \bigcup_{n \in \mathbb{Z}} \{Y \in [\log(k) + n, \log(k+1) + n[\}. \tag{3}$$

Let $\mathcal{M}(y)$ denote the mantissa of the real number $y$, defined by:

if $n \in \mathbb{Z}$ and $y \in [n, n+1[$, $\mathcal{M}(y) := y - n$.

(The integer $n$ above is the integral part of $y$, usually denoted $\lfloor y \rfloor$, and $\mathcal{M}(y)$
is also called the fractional part of $y$.) Thus $D = k$ is equivalent to (cf. (3))

$$\mathcal{M}(Y) \in [\log k, \log(k+1)[. \tag{4}$$
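The equivalence between the first digit and condition (4) can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, and the "direct" digit is read from the scientific notation of x rather than from any method of the papers quoted here.

```python
import math

def mantissa(y):
    """Fractional part M(y) = y - floor(y), a number in [0, 1[."""
    return y - math.floor(y)

def first_digit_direct(x):
    """First significant digit of x > 0, read off its scientific notation."""
    return int(f"{x:.17e}"[0])

def first_digit_via_mantissa(x):
    """Digit k characterised by (4): M(log10 x) lies in [log10 k, log10(k + 1)[."""
    m = mantissa(math.log10(x))
    for k in range(1, 10):
        if m < math.log10(k + 1):
            return k
    return 9  # only reachable through rounding when m is extremely close to 1

if __name__ == "__main__":
    for x in (1234.5, 0.0456, 9.99, 250.0, 3.7e-5):
        print(x, first_digit_direct(x), first_digit_via_mantissa(x))
```

For values of x extremely close to an endpoint $k\,10^n$, floating-point rounding of the logarithm may make the two functions disagree; the sketch makes no attempt to handle that case.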

Assume Y has the density g. Then $\mathcal{M}(Y)$ has the density (this is already
in [Pi, p. 1224], [Fel, (8.3) p. 62], [R1, (8.8) p. 531]),

$$[0,1] \ni y \mapsto \sum_{n \in \mathbb{Z}} g(n + y) =: \tilde g(y), \tag{5}$$

(the point 1 should not be in the domain, but later, for expressing the total
variation of $\tilde g$, it will be useful). We call $\tilde g$ the stacked density. And let us
denote by G the cumulative distribution function of $\mathcal{M}(Y)$:

$$\forall z \in [0,1], \quad G(z) = \int_0^z \tilde g(u)\,du.$$

We say that g is unimodal if g is non-decreasing up to some abscissa, and then
non-increasing.
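To make the stacking operation (5) concrete, here is a minimal Python sketch. It assumes, purely for illustration, that Y is normal; the truncation level `n_max` and the midpoint rule used for G are choices made here, not part of the text.

```python
import math

def normal_pdf(y, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2), used here as a sample density g of Y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def stacked_density(g, y, n_max=50):
    """Truncation of (5): sum of g(n + y) over |n| <= n_max."""
    return sum(g(n + y) for n in range(-n_max, n_max + 1))

def stacked_cdf(g, z, steps=1000, n_max=50):
    """G(z): integral of the stacked density over [0, z] (midpoint rule)."""
    h = z / steps
    return h * sum(stacked_density(g, (i + 0.5) * h, n_max) for i in range(steps))

if __name__ == "__main__":
    for sigma in (0.2, 0.5, 1.0):
        g = lambda y, s=sigma: normal_pdf(y, 0.3, s)
        worst = max(abs(stacked_density(g, i / 200) - 1.0) for i in range(201))
        print(f"sigma={sigma}: max |stacked density - 1| on a grid = {worst:.3e}")
```

The printed quantity measures how far the stacked density is from the constant 1, that is, how far the mantissa of Y is from being uniform.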

The end of this Section is not necessary to understand the remainder
of the paper, but it corrects the impression caused by the seemingly
non-smooth definition of the mantissa. Classically the torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ is identified
with [0, 1[. With this identification the canonical surjection $\varphi : \mathbb{R} \to \mathbb{T}$
coincides with the mantissa $\mathcal{M}$. A geometrical view is: use for $\mathbb{T}$ the unit
circle $\mathbb{U}$ via the identification

$$\mathbb{T} \simeq [0,1[ \ni t \mapsto e^{2\pi i t} \in \mathbb{U}$$

and for $\mathbb{R}$ the helicoid $\mathbb{H}$ via the identification

$$\mathbb{R} \ni t \mapsto (e^{2\pi i t}, t) \in \mathbb{H} \subset \mathbb{U} \times \mathbb{R}.$$

Then $\varphi$ becomes the very smooth map

$$\mathbb{H} \ni (z, y) \mapsto z \in \mathbb{U}.$$



3 Position of the problem 

We will rarely use the following:

$$\forall z \in [0,1], \quad G(z) = z \tag{6}$$

(equivalently the stacked density of (5) equals 1 a.e.), in which case one commonly
says "X satisfies the Benford law". More correct would be: X modulo 10 (in the
multiplicative group $\mathbb{R}_+^*$) obeys the Benford law.

Benford's phenomenon for the first digit is exactly satisfied if

$$\forall k \in \{2, \dots, 9\}, \quad G(\log k) = \log k.$$

As for approximation, one can ask for

$$\forall k \in \{2, \dots, 9\}, \quad G(\log k) - \log k \text{ is small}$$

or "the stacked density is close to the constant function $1_{[0,1]}$".

Once a sufficient condition expressed with g (that is, with Y) has been found,
the difficulty will be, after expressing it with $X = 10^Y$, to inventory
which laws satisfy the sufficient condition. See Sections 7 and 11 where we
will question the pertinence of this sufficient condition.

4 Poincaré's roulette problem (from Feller)

This title comes from Feller [Fel, Section 8 (b) p. 62] (see footnote 7). If a ball is
launched from a given zero point on a circle (see footnote 8) of circumference 1, and if
the length of its path is Y, the final position of the ball will be $\mathcal{M}(Y)$.

When is the law of $\mathcal{M}(Y)$ close to the uniform distribution? Intuitively, if
one throws the ball with sufficient force and no special effort to get an integer
number of revolutions or some other precise result, the uniform distribution
will be approached.

Feller [Fel, p. 62] says that a valid assumption is "g is sufficiently spread"
(implying a small maximum). This is a bit fuzzy. Soon after, Feller gives
a precise and relevant hypothesis: the density g of Y = log X is unimodal
and has a small maximum. (Below we will quote Pinkham [Pi, 1961] who
worked with a better hypothesis.) In my opinion there is a flaw in [Fel]
about which I could not find any precise reference in the literature. It is
the following: the point $x_k$ defined just after (8.4), which is nothing else but
$\mathcal{M}(x - a) + a + k$, is not necessarily on the left of $[a + k, a + k + 1[$, so the
assertion, just below (8.5), "For k < 0 the integrand is ≤ 0" is not correct.
Nevertheless Raimi [R1, p. 533] quotes Feller.

7 Note that Feller says at the end of his Section (b) that Poisson's formula could be
used. And when he turns to Benford in Section (c) he quotes the name (not any paper)
of Pinkham.

8 Poincaré [Po] speaks of the roulette on pages 11-13 and pages 148-150, and of digits on
pages 313-320. Two figures in [Few] look a bit like the figure in [Po, page 149]. See our
Section 10.

Under the unimodality hypothesis Gauvrit and Delahaye in 2008 [GD1]
gave a correct proof of

$$\forall z \in [0,1], \quad |G(z) - z| \le 2 \max g. \tag{7}$$

Their proof is in the line of Feller but they do not quote him; in [GD2]
they spoke of "scatter and regularity" which are surely not the right words.
Dümbgen-Leuenberger in 2008 [DL, Th.1 and Cor. 2], still starting from the
same "spread idea" (they assume that the total variation of g is small),
give far better bounds: see Section 6 below. Already in 1961 Pinkham
[Pi, Corollary p. 1229] (quoted by Raimi [R1, (8.12) p. 533]), using Fourier
Analysis arguments, obtained

$$\sup_{0 \le z \le 1} |G(z) - z| \le \mathrm{TV}(g)/6.$$

5 The proof of Gauvrit-Delahaye 

Despite the existence of [Pi, 1961] and [DL, 2008] I reproduce the proof by
Gauvrit and Delahaye because it is elementary and pleasant. We assume
the density g of Y = log X is unimodal and we denote by M its maximum
over $\mathbb{R}$ (a maximum meant to be small).

Proof of (7). The density g is non-decreasing on $]-\infty, b]$ and non-increasing
on $[b, +\infty[$. Let $M = g(b)$. Without loss of generality we can translate g
by an integer $n \in \mathbb{Z}$, so we may suppose (see footnote 9) $b \in [0, 1]$. Let $z \in ]0, 1]$. We will
prove the two following inequalities:

$$G(z) \le z + 2M \quad\text{and}\quad G(z) \ge z - 2M.$$

The idea (not far from the idea in Feller's book) is that on the left of b the mean
of g on $[n, n+z]$ is less than the mean (see footnote 10) of g on $[n, n+1]$ and that on the
right of b the mean of g on $[n, n+z]$ is less than that of g on $[n+z-1, n+z]$.



9 This is the argument in [GD1]. Surely b = 0 is possible (here we have forgotten D
and the factor 10 relative to X) and maybe (7) could be improved by a factor 2.

10 To prove $\frac{1}{z}\int_0^z g(y)\,dy \le \int_0^1 g(y)\,dy$ when g is non-decreasing on [0,1], express
$\int_0^z g(y)\,dy$ as an integral over [0, 1] by a linear change of variable.



Precisely: for any $n \le -1$, since g is non-decreasing on $]-\infty, 0]$, one has

$$\frac{1}{z}\int_n^{n+z} g(y)\,dy \le \int_n^{n+1} g(y)\,dy$$

hence

$$\frac{1}{z}\sum_{n \in \mathbb{Z},\, n \le -1} \int_n^{n+z} g(y)\,dy \le \int_{-\infty}^{0} g(y)\,dy. \tag{8}$$

Similarly, for any $n \ge 2$, thanks to the non-increasingness of g on $[1+z, +\infty[$,

$$\frac{1}{z}\int_n^{n+z} g(y)\,dy \le \int_{n+z-1}^{n+z} g(y)\,dy$$

hence

$$\frac{1}{z}\sum_{n \in \mathbb{Z},\, n \ge 2} \int_n^{n+z} g(y)\,dy \le \int_{1+z}^{+\infty} g(y)\,dy. \tag{9}$$

Summing (8) and (9) gives

$$\frac{1}{z}\sum_{n \in \mathbb{Z},\, n \ne 0,\, n \ne 1} \int_n^{n+z} g(y)\,dy \le \int_{-\infty}^{+\infty} g(y)\,dy = 1.$$

On the left-hand side the terms corresponding to n = 0 and n = 1 are lacking.
Each of them is bounded by

$$\frac{1}{z}\int_n^{n+z} g(y)\,dy \le M$$

hence

$$\frac{1}{z}\,G(z) \le 1 + 2M$$

and

$$G(z) \le z + 2M.$$

Now we turn to

$$G(z) \ge z - 2M.$$

On the left of b the mean of g on $[n, n+z]$ is greater than the mean of g on
$[n+z-1, n+z]$. And on the right of b the mean of g on $[n, n+z]$ is greater
than the mean of g on $[n, n+1]$. Thus for $n \le -1$,

$$\frac{1}{z}\int_n^{n+z} g(y)\,dy \ge \int_{n+z-1}^{n+z} g(y)\,dy$$

and for $n \ge 1$,

$$\frac{1}{z}\int_n^{n+z} g(y)\,dy \ge \int_n^{n+1} g(y)\,dy$$

and summing

$$\frac{1}{z}\sum_{n \in \mathbb{Z},\, n \ne 0} \int_n^{n+z} g(y)\,dy \ge \int_{-\infty}^{-1+z} g(y)\,dy + \int_{1}^{+\infty} g(y)\,dy = \int_{-\infty}^{+\infty} g(y)\,dy - \int_{-1+z}^{1} g(y)\,dy.$$

As the interval $[-1+z, 1]$ has length $\le 2$, the last term has absolute value
$\le 2M$. Since $g \ge 0$, the left-hand side is at most $G(z)/z$; hence $G(z)/z \ge 1 - 2M$,
which gives $G(z) \ge z - 2M$. □

6 Bounds expressed with total variation (Dümbgen-Leuenberger)

We will present essentially some results by Dümbgen-Leuenberger in 2008
[DL, Th.1 and Cor. 2]. Finite total variation encompasses unimodality.
Precisely, if g is unimodal, its total variation is $2 \max g$. With total variation
several local minima and maxima are manageable (see footnote 11).

Recall that g is the density of Y on $\mathbb{R}$. By the "stacking" operation, the
density of $\mathcal{M}(Y)$ on [0,1] is the stacked density $\tilde g(z) = \sum_{n \in \mathbb{Z}} g(n+z)$ defined in (5).
A classical notion is total variation. We assume that g has a finite total
variation which we define by (see footnote 12)

$$\mathrm{TV}(g) := \sup\Big\{ \sum_{i=1}^{m} |g(y_i) - g(y_{i-1})| \; ; \; m \ge 1,\; -\infty < y_0 \le \dots \le y_m < +\infty \Big\}.$$

If g is unimodal, $g(y) \to 0$ when $|y| \to +\infty$, and $\mathrm{TV}(g) = 2 \max_{\mathbb{R}} g$.

As for the total variation of $\tilde g$, which is a function on the torus $\mathbb{T}$ identified
with the half-open interval [0, 1[, one should consider

$$\sup\Big\{ \sum_{i=1}^{m} |\tilde g(z_i) - \tilde g(z_{i-1})| + |\tilde g(z_m) - \tilde g(z_0)| \; ; \; m \ge 1,\; 0 \le z_0 \le \dots \le z_m < 1 \Big\}.$$
i=l 



11 In [GD1], [GD2] the authors say that a finite number of bumps is possible. The proof
could be tedious. Example 8.2.1 below shows that an infinite sequence of bumps may be
bad.

12 Usually this formula is written with strict inequalities. It would give the same result
(repetition of a value is useless). For a fine study of finite total variation functions in one
variable, but for vector valued functions, see [M]. The total variation could be
overestimated if one used "erratic values" of g. A non-erratic value at y is a value between the
two lateral limits, which do exist; see for example [M, Prop. 4.2 p. 11]. Note that variation
is better adapted to cumulative functions than to densities!



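But considering the stacked density $\tilde g$ as defined on [0, 1] with (see footnote 13)
$\tilde g(1) = \tilde g(0)$ one can write

$$\mathrm{TV}(\tilde g) = \sup\Big\{ \sum_{i=1}^{m} |\tilde g(z_i) - \tilde g(z_{i-1})| \; ; \; m \ge 1,\; 0 \le z_0 \le \dots \le z_m \le 1 \Big\}.$$

Now we observe that

$$\tilde g(z) = \lim_{N \to \infty} g_N(z) \quad\text{where}\quad g_N(z) = \sum_{n=-N}^{N} g(n + z).$$

Then for any sequence $0 \le z_0 \le \dots \le z_m \le 1$,

$$\sum_{i=1}^{m} |g_N(z_i) - g_N(z_{i-1})| \le \sum_{i=1}^{m} \sum_{n=-N}^{N} |g(n + z_i) - g(n + z_{i-1})| = \sum_{n=-N}^{N} \sum_{i=1}^{m} |g(n + z_i) - g(n + z_{i-1})| \le \mathrm{TV}(g)$$

hence the inequality (cf. the first assertion of [DL, Theorem 1] and [DL,
formula (5) page 107]),

$$\mathrm{TV}(\tilde g) \le \mathrm{TV}(g).$$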

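Since the stacked density $\tilde g$ satisfies

$$\sup \tilde g \ge \int_0^1 \tilde g(z)\,dz = 1 \ge \inf \tilde g$$

one has

$$R(\tilde g) := \sup_{z_1 < z_2} |\tilde g(z_2) - \tilde g(z_1)| \ge \sup_{z} |\tilde g(z) - 1|.$$

Note that $|\tilde g(z_2) - \tilde g(z_1)| = \max\big([\tilde g(z_2) - \tilde g(z_1)]^+, [\tilde g(z_2) - \tilde g(z_1)]^-\big)$. Since g
is integrable on $\mathbb{R}$ it tends to 0 at infinity, and with the notation

$$\mathrm{TV}^+(g) := \sup\Big\{ \sum_{i=1}^{m} \big(g(y_i) - g(y_{i-1})\big)^+ \; ; \; m \ge 1,\; -\infty < y_0 \le \dots \le y_m < +\infty \Big\}$$

and the analogous one with negative parts, one has $\mathrm{TV}^+(g) = \mathrm{TV}^-(g) = \mathrm{TV}(g)/2$.
Hence (cf. [DL, Th.1])

$$\sup_{z \in [0,1]} |\tilde g(z) - 1| \le R(\tilde g) \le \mathrm{TV}(g)/2.$$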
ze[0,i] 



13 Here we can see the importance of a sharp definition of the stacked density $\tilde g$: for example
if $g(y) = 2y$ on [0, 1] and 0 elsewhere, $\tilde g(z) = 2z$ on ]0, 1[, but the downfall of 2 has to be
added to the progressive increase of 2 to get the true value $\mathrm{TV}(\tilde g) = 4$.



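Now we are going to prove [DL, Cor. 2 p. 102]: for all $0 \le z_1 < z_2 \le 1$,

$$|G(z_2) - G(z_1) - (z_2 - z_1)| \le (z_2 - z_1)[1 - (z_2 - z_1)]\,\mathrm{TV}(g)/2.$$

Let $\delta := z_2 - z_1$ and extend the stacked density $\tilde g$ of (5) by periodicity, so that its
integral over any interval of length 1 equals 1. Then (we reproduce [DL, proof of
Cor. 2 p. 108])

$$|G(z_2) - G(z_1) - (z_2 - z_1)| = \Big| \int_{z_1}^{z_2} \tilde g(z)\,dz - \delta \int_{z_2 - 1}^{z_2} \tilde g(z)\,dz \Big|$$
$$= \Big| (1 - \delta) \int_{z_1}^{z_2} \tilde g(z)\,dz - \delta \int_{z_2 - 1}^{z_1} \tilde g(z)\,dz \Big|$$
$$= \delta(1 - \delta) \Big| \int_0^1 \big[\tilde g(z_1 + \delta t) - \tilde g(z_1 - (1 - \delta)t)\big]\,dt \Big|$$
$$\le \delta(1 - \delta) \int_0^1 \big|\tilde g(z_1 + \delta t) - \tilde g(z_1 - (1 - \delta)t)\big|\,dt$$
$$\le \delta(1 - \delta)\,R(\tilde g)$$
$$\le \delta(1 - \delta)\,\mathrm{TV}(g)/2$$

which implies, since $\delta(1 - \delta) \le 1/4$,

$$\sup_{0 \le z_1 < z_2 \le 1} |G(z_2) - G(z_1) - (z_2 - z_1)| \le \mathrm{TV}(g)/8. \tag{10}$$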



As already said, in 1961 Pinkham [Pi, bottom of page 1228], using Fourier
Analysis arguments, obtained

$$\sup_{0 \le z \le 1} |G(z) - z| \le \mathrm{TV}(g)/6.$$

All these results give better bounds than those of the foregoing Section.
Indeed, if g is unimodal, (7) gives

$$|G(z_2) - G(z_1) - (z_2 - z_1)| \le 4 \max g = 2\,\mathrm{TV}(g).$$

In their paper [DL] Dümbgen-Leuenberger give other fine bounds when g
admits derivatives.
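The relative strength of these bounds can be illustrated numerically. The sketch below assumes, as an example only, that Y is normal (so that g is unimodal and TV(g) = 2 max g), computes G through the normal cumulative distribution function, and compares the observed supremum on a grid with TV(g)/8 from (10), TV(g)/6 (Pinkham) and 2 max g from (7). All function names are illustrative.

```python
import math

def normal_cdf(y, mu, sigma):
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def stacked_cdf(z, mu, sigma, n_max=60):
    """G(z) = sum over n of P(n <= Y < n + z) for Y ~ N(mu, sigma^2)."""
    return sum(normal_cdf(n + z, mu, sigma) - normal_cdf(n, mu, sigma)
               for n in range(-n_max, n_max + 1))

def compare_bounds(mu, sigma, grid=1000):
    """Return (observed, TV/8, TV/6, 2*max g) for a normal density g."""
    dev = [stacked_cdf(i / grid, mu, sigma) - i / grid for i in range(grid + 1)]
    # sup over z1 < z2 of |G(z2) - G(z1) - (z2 - z1)|, restricted to the grid
    observed = max(dev) - min(dev)
    max_g = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    tv = 2.0 * max_g  # total variation of a unimodal density
    return observed, tv / 8.0, tv / 6.0, 2.0 * max_g

if __name__ == "__main__":
    for sigma in (0.1, 0.3, 1.0):
        print(sigma, ["%.3e" % v for v in compare_bounds(0.3, sigma, grid=400)])
```

For a normal density the observed deviation falls off much faster than any of the three bounds as sigma grows, which is consistent with the remark that [DL] obtain finer bounds when g admits derivatives.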



7 Return to X, the disaster 

Now, what becomes of a hypothesis concerning g when expressed in terms of
X or its density $f$? A disaster appears: spread of $f$ is not equivalent to
spread of g. The two reciprocal bijections (see footnote 14)

$$\mathbb{R} \ni y \mapsto 10^y \in \mathbb{R}_+^* \quad\text{and}\quad \mathbb{R}_+^* \ni x \mapsto \log x \in \mathbb{R}$$

exchange perfectly the couple X and Y and also the couple of cumulative
distribution functions $F_X$ and $F_Y$: one can switch between one and the
other only by changing x into $10^y$ or y into $\log x$. But as for the density one has

$$g(y) = \ln(10)\,10^y f(10^y) \quad\text{and}\quad f(x) = \frac{g(\log x)}{\ln(10)\,x}. \tag{11}$$

One could switch between the density of X and the density of Y only by
changing x into $10^y$ or y into $\log x$ if one had taken for density of X the density
of its law $P_X$ with respect to the following Haar measure (see footnote 15) on $\mathbb{R}_+^*$: the
image (also called push-forward) of Lebesgue measure on $\mathbb{R}$ by $y \mapsto 10^y$. With respect to
the Lebesgue measure this Haar measure has the density $x \mapsto [\ln(10)\,x]^{-1}$.
Obviously from (11), unimodality of g is equivalent to unimodality of
$\hat f := [x \mapsto x f(x)]$ and

$$\max_y g(y) = \ln(10) \max_x\,[x f(x)].$$

And as for the total variation of g, $\mathrm{TV}(g) = \ln(10)\,\mathrm{TV}(\hat f)$.

Here the "disaster" occurs: even if $f$ is unimodal, g may not be, see
Section 8.1; and even if $\max f$ tends to 0 when a parameter converges to
some value, the maximum of $x \mapsto x f(x)$ may not tend to 0. Despite the fact
that log-normal laws (see 8.3.1) and Pareto laws (see 8.3.2) do the work, the
uniform law on [a,b] and the exponential law (see 8.2.2 and 8.2.3) exemplify
the difficulty.
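A small Python sketch of the change of density (11) may help. It takes the exponential law as an example and checks on a grid that max f shrinks with the parameter while max g stays at ln(10)/e; the grid bounds and helper names are assumptions of this illustration.

```python
import math

def density_of_log10(f):
    """Density g of Y = log10(X) obtained from the density f of X via (11)."""
    return lambda y: math.log(10) * 10.0 ** y * f(10.0 ** y)

def exponential_density(lam):
    return lambda x: lam * math.exp(-lam * x)

def grid_max(g, lo=-12.0, hi=6.0, step=0.001):
    """Maximum of g over a regular grid of [lo, hi] (a crude stand-in for max g)."""
    n = int(round((hi - lo) / step))
    return max(g(lo + i * step) for i in range(n + 1))

if __name__ == "__main__":
    for lam in (1.0, 0.1, 0.001):
        g = density_of_log10(exponential_density(lam))
        # max f equals lam (attained at x = 0), whereas max g does not move
        print(f"lambda={lam}: max f = {lam}, max g ~ {grid_max(g):.6f}")
    print("ln(10)/e =", math.log(10) / math.e)
```

Letting the parameter tend to 0 only translates g along the y axis; it does not flatten it.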

As alluded to above, classical usual laws described in textbooks
are families depending on one or several parameters. The list is impressive,
but the fact that two among the simplest ones fail to exemplify the
Benford phenomenon calls for questioning. Surely the many random
variables which seem to obey Benford do not follow a classical "usual law",
and the sentence "if the spread of the r.v. is very large" (as in [Fel, p. 63
just after (8.6)], see footnote 16) is an unwise shortcut. For more comments see
[Br], [BH1] and our Section 11.

14 The two ordered sets $\mathbb{R}$ and $\mathbb{R}_+^*$ are isomorphic.

15 I am indebted to J. Saint-Pierre [SP] for this idea of Haar measure. Note that as
early as 1970 Hamming [Ha] used the measure with density $1/(\ln(10)\,x)$ on the interval
$[10^{-1}, 1]$. See also Section 9.

16 In Feller the r.v. is denoted Y but it is the positive variable whose first digit is
considered.

The comments in Terence Tao's blog [T1] (see also a book [T2] that I have not
seen) are rather informal, but their ideas perhaps meet those of Hill [Hi1, Th.3
p. 361], [BH2, Section 6.2, especially Th.6.20].

As already noticed by many authors, mixing of several data ([Hi1], [Hi4],
[JR]) and products [Bo] can give good laws.

I mention that Gauvrit and Delahaye [GD3, Th.2] say that their more general
result, with a strictly increasing function in place of log, applies to more
situations.

8 Examples 

8.1 Annoying examples 

One could expect that the hypothesis "the density g of Y = log X is
unimodal with a small maximum" is usually encountered. Expressed with X,
it means that $x \mapsto x f(x)$ is unimodal with a small maximum. This does not
apply to the uniform law and to the exponential law: see Section 8.2 below.

Let us give small examples showing the action of multiplication by x.

1) Let

$$f_0(x) = \begin{cases} x & \text{if } x \in ]0, 1], \\ x^{-1}\,[1 + (x-1)(x-2)/2] & \text{if } x \in [1, 2], \\ 4x^{-3} & \text{if } x \in [2, +\infty[. \end{cases}$$

This is a positive integrable function, so it is, up to a multiplicative
coefficient, a density. It is decreasing on [1,2] because on this interval

$$f_0'(x) = \frac{x^2 - 4}{2x^2} \le 0$$

so $f_0$ is unimodal. But $x \mapsto x f_0(x)$ is no longer unimodal. It has two
maxima, at x = 1 and at x = 2. □

2) Let $f$ be defined on $]0, +\infty[$ by $f(x) = 0$ on $]0, 1/2] \cup \bigcup_{n \ge 1} \{n + 1/2\}$,
$f(n) = 1/n^2$ for all $n \ge 1$, and $f$ affine on all intervals $[n - 1/2, n]$ and all
intervals $[n, n + 1/2]$. The graph of $f$ consists of a series of bumps in the form of
isosceles triangles. The total variation is finite with value $2\sum_{n=1}^{\infty} 1/n^2$. As
for $h(x) := x f(x)$, this function equals 0 at each $n - 1/2$ and equals $1/n$ at
each n. Since $\sum_{n=1}^{\infty} 1/n = +\infty$ the total variation of h is infinite. □

8.2 Negative examples

8.2.1 A kind of periodicity 

If the law of X is carried by the set

$$\bigcup_{n \in \mathbb{Z}} [0.9 \cdot 10^n, 10^n[$$

then D = 9 (example inspired by [GD2, p.3]). And this holds in spite of a large
"scattering", as soon as many intervals have > 0 probabilities. This can
be realized with a $C^\infty$ density taking strictly positive values on each open
interval $]0.9 \cdot 10^n, 10^n[$.

8.2.2 Uniform law 

The density is $f(x) = \frac{1}{b-a} 1_{[a,b]}(x)$ (with the parameters a and b satisfying
$0 \le a < b$). Surely $x \mapsto x f(x)$ is unimodal on $\mathbb{R}_+^*$ and the spread of $f$ is
large when $b \to +\infty$. The maximum of $x f(x)$ is 1 if a = 0. And, if a > 0,
the maximum is $\frac{b}{b-a}$, which decreases when b increases to $+\infty$, but the limit
is 1. So an inequality such as (7) or (10) does not apply. For a finer study see
[Br], [BH2, Prop.4.15 p. 37].
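The lack of convergence can be seen on P(D = 1) itself. The following Python sketch computes it exactly (up to a truncated sum over decades) for the uniform law on [a, b]; the truncation bounds `n_min`, `n_max` are assumptions of the sketch. For a = 0 one finds by hand that P(D = 1) keeps oscillating between 1/9 (b a power of 10) and 5/9 (b equal to 2 times a power of 10), however large b is.

```python
def uniform_cdf(x, a, b):
    """Cumulative distribution function of the uniform law on [a, b]."""
    return min(max((x - a) / (b - a), 0.0), 1.0)

def prob_first_digit_uniform(k, a, b, n_min=-40, n_max=40):
    """P(D = k) as the sum over n of P(k 10^n <= X < (k+1) 10^n)."""
    return sum(uniform_cdf((k + 1) * 10.0 ** n, a, b) - uniform_cdf(k * 10.0 ** n, a, b)
               for n in range(n_min, n_max + 1))

if __name__ == "__main__":
    for b in (10.0, 20.0, 50.0, 1e6, 2e6, 5e6):
        print(f"b={b:g}: P(D=1) = {prob_first_digit_uniform(1, 0.0, b):.4f}")
```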

8.2.3 Exponential law 

The density is $f(x) = \lambda e^{-\lambda x}$ ($\lambda \in ]0, +\infty[$ is the parameter). One could
naively expect a good Benford approximation when $\lambda \to 0$. Differentiating
$h(x) := x f(x)$ one proves easily that the function $x f(x)$ is unimodal; and
its maximum, attained at $x = 1/\lambda$, has the value $1/e$ (particular case of (10)
in [V2] about the Gamma law). This maximum does not tend to 0 as $\lambda \to 0$.
So an inequality such as (7) does not apply; moreover $4\ln(10)\,e^{-1} = 3.388...$ is a
very huge value. Here (10) gives

$$\sup_{0 \le z_1 < z_2 \le 1} |G(z_2) - G(z_1) - (z_2 - z_1)| \le 2\ln(10)\,\big[\max_{x > 0} x f(x)\big]/8 = \ln(10)\,e^{-1}/4 = 0.211...$$

Engel and Leuenberger [EL] study the exact formula coming from the partition of
{D = k}:

$$P(D = k) = \sum_{n=-\infty}^{+\infty} e^{-\lambda k 10^n}\big(1 - e^{-\lambda 10^n}\big).$$

They prove that Benford is almost satisfied, with a periodic dependence
on $\log \lambda$ and small gaps. But the error does not tend to 0 as $\lambda \to 0$. Note
that from [EL, Fig.1 p. 363], as functions of $\lambda$ the probabilities $P(D = k)$
oscillate around the "Benford values" $\log((k+1)/k)$ but they do not take
the Benford value simultaneously.
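The exact series for the exponential law is easy to evaluate. The Python sketch below truncates the sum over n (the bounds `n_min`, `n_max` are assumptions that suit parameters of moderate size) and prints P(D = 1) for a few values of the parameter.

```python
import math

def prob_first_digit_exponential(k, lam, n_min=-30, n_max=30):
    """Truncated sum of exp(-lam*k*10^n) * (1 - exp(-lam*10^n)) over n."""
    total = 0.0
    for n in range(n_min, n_max + 1):
        s = lam * 10.0 ** n
        total += math.exp(-k * s) * (-math.expm1(-s))  # -expm1(-s) = 1 - exp(-s)
    return total

if __name__ == "__main__":
    print("Benford value log10(2) =", round(math.log10(2), 4))
    for lam in (1.0, 0.5, 0.2, 0.1, 0.01):
        print(f"lambda={lam}: P(D=1) = {prob_first_digit_exponential(1, lam):.4f}")
```

Replacing the parameter by a tenth of its value only shifts the index n, so the printed values repeat with period 1 in the logarithm of the parameter: the gap to log 2 stays small but does not vanish.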

8.3 Positive examples 

8.3.1 Log-normal laws

When $X = \exp(Y_0)$ with $Y_0$ of law $\mathcal{N}(\mu, \sigma^2)$, one has $Y = \log X = Y_0/\ln(10)$.
Recall that the density of $Y_0$ is

$$y \mapsto \frac{1}{\sigma\sqrt{2\pi}} \exp\Big(-\frac{(y - \mu)^2}{2\sigma^2}\Big).$$

So the required properties of g are clearly verified. An inequality such as (7)
applies: when $\sigma$ tends to infinity the Benford approximation is good.
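How fast the log-normal law approaches Benford can be checked with a short Python sketch. It uses the normal cumulative distribution function for Y = log X and a truncated sum over decades; `n_span` and the function names are assumptions of this illustration.

```python
import math

def normal_cdf(y, mu, sigma):
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def prob_first_digit_lognormal(k, mu, sigma, n_span=80):
    """P(D = k) when ln X ~ N(mu, sigma^2), i.e. log10 X ~ N(mu/ln 10, (sigma/ln 10)^2)."""
    m, s = mu / math.log(10), sigma / math.log(10)
    centre = math.floor(m)
    return sum(normal_cdf(n + math.log10(k + 1), m, s) - normal_cdf(n + math.log10(k), m, s)
               for n in range(centre - n_span, centre + n_span + 1))

def max_deviation(mu, sigma):
    """Largest gap over k = 1..9 between P(D = k) and the Benford value."""
    return max(abs(prob_first_digit_lognormal(k, mu, sigma) - math.log10((k + 1) / k))
               for k in range(1, 10))

if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0, 4.0):
        print(f"sigma={sigma}: max deviation = {max_deviation(0.0, sigma):.3e}")
```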

8.3.2 Pareto laws 

The Pareto law of type 1 (cf. [GD2, p. 7]) depends on two parameters $\alpha$ and
$x_0$, both in $\mathbb{R}_+^*$, and has the density

$$f(x) = \frac{\alpha\,x_0^{\alpha}}{x^{\alpha+1}}\,1_{[x_0, \infty[}(x).$$

There $x \mapsto x f(x) = \frac{\alpha\,x_0^{\alpha}}{x^{\alpha}}\,1_{[x_0, \infty[}(x)$ is non-decreasing on $]-\infty, x_0]$
(identically null on $]-\infty, x_0[$) and non-increasing on $[x_0, +\infty[$. The maximum,
reached at $x = x_0$, equals $x_0 f(x_0) = \alpha$. Thanks to an inequality such as (7),
when $\alpha$ tends to 0 the probabilities $P(D = k)$ converge to the values of
Benford (1). Note that X has no mean as soon as $\alpha \le 1$, which indicates a
large "scattering".

Pareto laws of type 2 are treated in [GD2].
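For the Pareto law of type 1 the first-digit probabilities can be computed from the survival function. The Python sketch below works in log10 scale to avoid overflow for small values of the shape parameter; the truncation rule and the names are assumptions of the sketch. For $x_0 = 1$ the sum is geometric and gives $P(D = 1) = (1 - 2^{-\alpha})/(1 - 10^{-\alpha})$, which tends to log 2 as $\alpha \to 0$.

```python
import math

def pareto_survival_log10(y, alpha, x0):
    """P(log10 X > y) for the Pareto law of type 1: (x0 / 10^y)^alpha above log10 x0."""
    return 10.0 ** (-alpha * max(y - math.log10(x0), 0.0))

def prob_first_digit_pareto(k, alpha, x0=1.0):
    """P(D = k) as a truncated sum over decades n, computed in log10 scale."""
    n_lo = math.floor(math.log10(x0)) - 1
    n_hi = n_lo + int(40.0 / alpha) + 2  # beyond n_hi the tail mass is below about 1e-40
    return sum(pareto_survival_log10(n + math.log10(k), alpha, x0)
               - pareto_survival_log10(n + math.log10(k + 1), alpha, x0)
               for n in range(n_lo, n_hi + 1))

if __name__ == "__main__":
    print("Benford value log10(2) =", round(math.log10(2), 5))
    for alpha in (1.0, 0.5, 0.1, 0.05, 0.01):
        print(f"alpha={alpha}: P(D=1) = {prob_first_digit_pareto(1, alpha):.5f}")
```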

9 Two exact results 

There exist in the literature a lot of exact results, some relying on "scale
invariance", see [Hi1], [Hi2], [Hi3], others relying on mixing of laws, see [JR].
I will give personal results written when I was completely naive about the
Benford phenomenon and unaware of [Ha], [BS] and [BH2].

The next Theorem shows that the rough hypothesis "$\mathcal{M}(\log X)$ follows
the uniform law on [0, 1]" admits sufficient conditions. The first part has
already been given in 2010 by Block and Savits [BS], but there is already in
Raimi [R1, 1976] a similar result which would come from Benford: see page
532 and the figure taken from Benford; see also [BH2, Ex. 3.6 p. 24]. The
second hypothesis comes from the caption of the figure in [GD1], [GD2].

Theorem 1 Let X be a random variable (X > 0) and $Y := \log(X)$. Suppose
that Y has the density g. Suppose one of the following hypotheses:

1) g is countably a step function, constant (equality Lebesgue a.e.) on
each interval $[n, n+1]$ ($n \in \mathbb{Z}$), i.e.

$$g = \sum_{n \in \mathbb{Z}} \gamma_n 1_{[n, n+1]}, \tag{12}$$

2) g is continuous on $\mathbb{R}$ and affine on each interval $[n, n+1]$ ($n \in \mathbb{Z}$).

Then X follows the Benford law (6).

Remarks. 1) Formula (12) is equivalent to the Block and Savits expression
[BS, (3)] (where $\gamma_n$ is $p_n$):

$$f(x) = \sum_{n=p}^{q-1} \gamma_n\,\frac{1_{[10^n, 10^{n+1}]}(x)}{\ln(10)\,x} \tag{13}$$

where $-\infty \le p < q \le +\infty$, $\gamma_n \ge 0$ and $\sum \gamma_n = 1$. Such densities could
approximate some real-life densities, but a precise study is in the field of
Numerical Analysis and Statistics (see a discussion in Part 2 of Section 11).
A particular case of (13) is the density

$$f(x) = \frac{1_{[10^p, 10^q]}(x)}{(q - p)\ln(10)\,x}$$

where $p < q$, $p, q \in \mathbb{Z}$, which has been given in a footnote of Section 2.

2) Without the continuity of g the second part does not hold: take
$g(y) = 2y$ on [0, 1] and 0 elsewhere.

Proof of Theorem 1. 1) Let $\gamma_n$ be the value of g on $[n, n+1]$. The series (5)
gives

$$\sum_{n \in \mathbb{Z}} g(n + y) = \sum_{n \in \mathbb{Z}} \gamma_n = 1$$

which proves that $\mathcal{M}(\log X)$ follows the uniform law on [0, 1].

2) The integral of g on $[n, n+1]$ is the area of a trapezoid and it amounts
to $\frac{1}{2}(g(n) + g(n+1))$. As the sum is 1, it holds $\sum_{n=-\infty}^{+\infty} g(n) = 1$. Above
[0, 1] the function

$$y \mapsto \sum_{n=-N}^{N} g(n + y)$$

is affine and equals $\sum_{n=-N}^{N} g(n)$ at 0 and equals $\sum_{n=-N}^{N} g(n+1)$ at 1. All this
converges to 1. For the incredulous reader, if any: the affinity entails

$$\sum_{n=-N}^{N} g(n + y) = y \sum_{n=-N}^{N} g(n+1) + (1 - y) \sum_{n=-N}^{N} g(n) = \sum_{n=-N}^{N} g(n) + y\,\big(g(N+1) - g(-N)\big) \to 1$$

when $N \to \infty$. □
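Part 2 of Theorem 1 and the counterexample of Remark 2 can both be checked on simple cases. The Python sketch below uses, as assumed examples, the triangular density $g(y) = \max(1 - |y|, 0)$ (continuous, affine on [-1, 0] and [0, 1]) and the discontinuous density $g(y) = 2y$ on [0, 1]; the function names are illustrative.

```python
import math

def triangular_cdf(y):
    """CDF of g(y) = max(1 - |y|, 0): continuous, affine on [-1, 0] and on [0, 1]."""
    if y <= -1.0:
        return 0.0
    if y <= 0.0:
        return 0.5 * (1.0 + y) ** 2
    if y <= 1.0:
        return 1.0 - 0.5 * (1.0 - y) ** 2
    return 1.0

def ramp_cdf(y):
    """CDF of g(y) = 2y on [0, 1] (not continuous at 1), as in Remark 2."""
    return min(max(y, 0.0), 1.0) ** 2

def prob_first_digit(k, cdf, n_min=-3, n_max=3):
    """P(D = k) = sum over n of P(n + log10 k <= Y < n + log10(k + 1)), as in (3)."""
    return sum(cdf(n + math.log10(k + 1)) - cdf(n + math.log10(k))
               for n in range(n_min, n_max + 1))

if __name__ == "__main__":
    for k in range(1, 10):
        print(k, round(prob_first_digit(k, triangular_cdf), 6),
              round(prob_first_digit(k, ramp_cdf), 6),
              round(math.log10((k + 1) / k), 6))
```

The triangular column coincides with the Benford column, while the ramp column does not.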

The next exact result is no longer realistic. It has already been obtained
by Hamming [Ha, Section iv p. 1615] (quoted in [R1, p. 535]). See also [BH2,
Part (i) of Th.4.13] and [BH2, Part 1 of Theorem 6.3]. One could imagine
collecting data (wealth, level of a river, etc.) in several places and several
countries where the units are not the same. All this would be listed together.

The idea leading to a mathematical result is: multiply a given r.v. $X_0$
which models our physical quantity (at least in one precise unit) by a random
coefficient belonging to [1,10], which gives X (and as for the law of X a
mixing of the laws of the homothetic r.v. of $X_0$). Changing the unit several
times by a factor 10 or 1/10 would not change the first digit in base 10. We
assume that the coefficient obeys the Haar measure (see footnote 17) of the multiplicative
group $(\mathbb{R}_+^*, \times)$ restricted to [1,10] (more precisely the image of Lebesgue
measure on [0, 1] by $u \mapsto 10^u$).

Theorem 2 Let $X_0$ be a random variable ($X_0 > 0$) defined on $(\Omega, \mathcal{F}, P_0)$.
The Lebesgue measure on [0, 1] is denoted by $\lambda$. Let X be the r.v. on $\Omega \times [0, 1]$
equipped with the probability measure $P := P_0 \otimes \lambda$ defined as

$$X(\omega, u) = 10^u X_0(\omega).$$

Then D obeys the Benford law: for $k \in \{1, \dots, 9\}$, $P(D = k) = \log\big(\frac{k+1}{k}\big)$.

17 See a foregoing footnote in Section 7.

Proof. Let $Y_0 := \log(X_0)$. For $k \in \{1, \dots, 9\}$ one has

$$D(\omega, u) = k \iff X(\omega, u) \in \bigcup_{n \in \mathbb{Z}} [k\,10^n, (k+1)\,10^n[ \iff u + Y_0(\omega) \in \bigcup_{n \in \mathbb{Z}} [n + \log k, n + \log(k+1)[.$$

The above unions are disjoint, hence we have to sum the terms

$$P\big(\{(\omega, u) ;\; u + Y_0(\omega) \in [n + \log k, n + \log(k+1)[\}\big).$$

We turn to calculus. The transformation of the second line relies on
successive integration (firstly with respect to u and then to y):

$$P\big(\{(\omega, u) ;\; u + Y_0(\omega) \in [n + \log k, n + \log(k+1)[\}\big)$$
$$= (P_{Y_0} \otimes \lambda)\big(\{(y, u) ;\; u + y \in [n + \log k, n + \log(k+1)[\}\big)$$
$$= \int_{\mathbb{R}} \lambda\big([n + \log k - y, n + \log(k+1) - y[ \,\cap\, [0, 1]\big)\,dP_{Y_0}(y)$$
$$= \int_{\mathbb{R}} \lambda\big([\log k - y, \log(k+1) - y] \cap [-n, -n+1]\big)\,dP_{Y_0}(y).$$

But

$$\sum_{n \in \mathbb{Z}} \lambda\big([\log k - y, \log(k+1) - y] \cap [-n, -n+1]\big) = \lambda\big([\log k - y, \log(k+1) - y]\big) = \log(k+1) - \log k.$$

As $\int_{\mathbb{R}} dP_{Y_0}(y) = 1$ this proves

$$\sum_{n=-\infty}^{\infty} P\big(\{(\omega, u) ;\; Y_0(\omega) \in [n - u + \log k, n - u + \log(k+1)[\}\big) = \log\Big(\frac{k+1}{k}\Big). \qquad □$$
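Theorem 2 lends itself to a Monte Carlo illustration. In the Python sketch below $X_0$ takes only three values (a hypothetical choice, made so that $X_0$ has no density), U is uniform on [0, 1[ and the first digit of $10^U X_0$ is tallied; the sample size and the seed are arbitrary.

```python
import math
import random

def first_digit(x):
    """First significant digit of x > 0, read off its scientific notation."""
    return int(f"{x:.17e}"[0])

def simulate_theorem2(sample_x0, n_draws=200_000, seed=0):
    """Empirical law of the first digit of X = 10^U * X0, with U uniform on [0, 1[."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(n_draws):
        x = 10.0 ** rng.random() * sample_x0(rng)
        counts[first_digit(x)] += 1
    return [c / n_draws for c in counts]

if __name__ == "__main__":
    # X0 takes only three values, so it has no density (hypothetical choice).
    freqs = simulate_theorem2(lambda rng: rng.choice([3.0, 0.07, 250.0]))
    for k in range(1, 10):
        print(k, round(freqs[k], 4), round(math.log10((k + 1) / k), 4))
```

The empirical frequencies agree with (1) up to sampling noise, whatever law is chosen for $X_0$.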



Comment. The hypothesis that the unit could be random and obey
a Haar measure is debatable. As said by some author, there is a ratio of 10
between the decimeter and the meter, but as for volumes one gets the ratio
of 10 between 100 dm³ and one m³ (and not between one dm³ and one m³).
And usual units are certainly numerous but finite in number: cf. meters
and feet (argument of [SF] quoted above in Section 1).

10 The stripey hat of Fewster and Poincaré

The following result, convergence (16), appears with a variant (n in place
of $\lambda$) in [BH2, Th.4.17 page 39]. The figure in Fewster [Few, Fig.1 p. 28]
looks like Poincaré's figure [Po, Fig. 15 p. 149]. In [Few] the grey parts are not
half of the length but have proportion z < 1. The idea is to give an intuitive
justification of (g being the density of the r.v. Y)

$$\int_{\mathbb{R}} \Big(\sum_{n \in \mathbb{Z}} 1_{[n, n+z]}\Big)(y)\,\frac{1}{\lambda}\,g\Big(\frac{y}{\lambda}\Big)\,dy \to z \quad\text{as } \lambda \to +\infty.$$

Fewster's curve is, like Poincaré's, "regular". Moreover the one by Fewster
is a bell (or hat) curve.

The following result is classical for those knowing Young measures and
Rademacher functions (cf. [V1]).

Lemma 1 Let $z \in ]0, 1[$ and $\varphi = \sum_{n \in \mathbb{Z}} 1_{[n, n+z]}$. Then with the notation

$$\varphi_\lambda(y) := \varphi(\lambda y),$$

the function $\varphi_\lambda$ converges, when $\lambda \to +\infty$, for $\sigma(L^\infty, L^1)$ to $z\,1_{\mathbb{R}}$, that is

$$\forall h \in L^1(\mathbb{R}), \quad \int_{\mathbb{R}} \varphi_\lambda(y)\,h(y)\,dy \to z \int_{\mathbb{R}} h(y)\,dy.$$

Proof. The convergence when h is the characteristic function of a compact
interval, $h = 1_{[a,b]}$, is elementary. Then it holds for linear combinations of
such functions, i.e. for functions h in a dense subset of $L^1$. Since $\varphi_\lambda$ belongs
to the unit ball of $L^\infty$, an equicontinuous subset of the dual space of $L^1$, the
result holds for any $h \in L^1$. □
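Lemma 1 can be illustrated on one explicit integrable function. The Python sketch below takes $h(y) = e^{-y}$ for y > 0 (an assumption made for this example; its integral is 1), for which the integral against the stripes is a sum over the intervals $[n/\lambda, (n+z)/\lambda]$ and also has a closed geometric form. The symbol `lam` stands for the dilation parameter.

```python
import math

def stripe_integral(z, lam, tail=60.0):
    """Integral over y > 0 of phi(lam*y) * exp(-y), phi = indicator of the union of [n, n+z].

    phi(lam*y) = 1 exactly when y lies in [n/lam, (n+z)/lam] for some integer n >= 0,
    so the integral is a sum of exp(-n/lam) - exp(-(n+z)/lam), truncated once
    exp(-n/lam) < exp(-tail).
    """
    n_stop = int(tail * lam) + 1
    return sum(math.exp(-n / lam) - math.exp(-(n + z) / lam) for n in range(n_stop + 1))

def stripe_integral_closed_form(z, lam):
    """Geometric-series value of the same integral (no truncation)."""
    return math.expm1(-z / lam) / math.expm1(-1.0 / lam)

if __name__ == "__main__":
    z = 0.3
    for lam in (1.0, 10.0, 100.0, 1000.0):
        print(f"lam={lam}: integral = {stripe_integral(z, lam):.6f} (limit z = {z})")
```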

Let us denote for $h \in L^1$, $\varphi \in L^\infty$,

$$\langle \varphi, h \rangle := \int_{\mathbb{R}} \varphi(y)\,h(y)\,dy.$$

Now let (note that the action of $\lambda$ is not the same on $\varphi$ and on g)

$$g_\lambda(y) = \frac{1}{\lambda}\,g\Big(\frac{y}{\lambda}\Big). \tag{14}$$

This is the density of $\lambda Y$ (note that $10^{\lambda Y} = X^\lambda$; and that this dilatation,
with $\sigma$ in place of $\lambda$, is already in [DL, page 100]; see also Part (i) of [BH2,
Th.6.1]). By change of variable, with $\varphi$ and $\varphi_\lambda$ as in Lemma 1,

$$\langle \varphi, g_\lambda \rangle = \langle \varphi_\lambda, g \rangle. \tag{15}$$

By (15) and Lemma 1

$$\int_{\mathbb{R}} \Big(\sum_{n \in \mathbb{Z}} 1_{[n, n+z]}\Big)(y)\,\frac{1}{\lambda}\,g\Big(\frac{y}{\lambda}\Big)\,dy = \langle \varphi, g_\lambda \rangle = \langle \varphi_\lambda, g \rangle \to z \int_{\mathbb{R}} g(y)\,dy = z. \tag{16}$$

Hence a spreading of g on $\mathbb{R}$ in the manner of (14) (i.e. a "dilatation") when
$\lambda \to +\infty$ implies the expected approximate Benford phenomenon. The
spreading (14) could be combined with a translation, but one is far from the
bounds by Dümbgen and Leuenberger [DL].

The Poincaré roulette, cf. [Po, Section 92 pp. 148-150], [Ch] (and maybe
[Fr], which is quoted by [Ch], but I did not see it), seems an easy result when
one knows that the function

$$r_n(x) = \begin{cases} 1 & \text{if } x \in [k/2n, (k+1)/2n[ \text{ for } k \text{ even}, \\ 0 & \text{elsewhere} \end{cases}$$

converges for $\sigma(L^\infty([0,1]), L^1([0,1]))$ to the constant function $\frac{1}{2} 1_{[0,1]}$. The point
is: let a circle be divided into 2n arcs of equal lengths, alternately black and
red. Then if g is a density of probability on the circle, the probability of
black tends to 1/2 when n tends to infinity. Poincaré assumed that the
density is regular.

11 Final comments 

1) A sufficient condition may be far from being necessary. Surely the reader
could find many examples himself. I just propose two:

a) a sufficient condition for a square matrix to have a null determinant
is "the first line is (0,...,0)";

b) a sufficient condition for $x^2$ to be $\le 4$ (x belonging to $\mathbb{R}$) is "$-1/2 \le
x \le 3/2$"; or a worse one: "$x = -1$"; or the tautological "$x \in \emptyset$".

If the proofs are sufficiently involved, if the mathematical objects belong to
infinite dimensional spaces, it could be hard to detect the ridiculousness of
the result.

Is it true that all work done in the Pinkham-Feller line is of this kind?

The hope of a good behavior, with respect to Benford, of a family
of laws depending on one or several parameters (cf. the "usual laws" of
textbooks) when a parameter converges to some limit is surely not the right
idea. Maybe only some laws (e.g. the log-normal and Pareto laws (type 1))
perfectly realize this hope. For some laws (for numerous ones?) the gap is
small (see [BH2, page 38]) but convergence to zero does not hold.

2) Note that as soon as a density expressed by (13) is close to the given
density $f$, the Benford phenomenon will be approximately satisfied. How can
one approach a given density $f$ in this way?

For example, which coefficients $\gamma_n$ make (13) a valuable approximation
of $f(x) = \lambda e^{-\lambda x}$? Note that on the intervals [0.01, 0.1], [0.1, 1], [1, 10], $e^{-x}$
decreases respectively by a factor 1.09, 2.459, 8103.08, while 1/x decreases
by a factor 10... A good adjustment seems difficult.
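The three factors quoted above are easy to recompute. The following minimal Python sketch does so; `decrease_factor` is an illustrative helper, not something from the references.

```python
import math

def decrease_factor(f, lo, hi):
    """Ratio f(lo) / f(hi): by how much f decreases between lo and hi."""
    return f(lo) / f(hi)

if __name__ == "__main__":
    for lo, hi in [(0.01, 0.1), (0.1, 1.0), (1.0, 10.0)]:
        print(f"[{lo}, {hi}]:",
              "exp(-x) factor =", round(decrease_factor(lambda x: math.exp(-x), lo, hi), 3),
              "| 1/x factor =", round(decrease_factor(lambda x: 1.0 / x, lo, hi), 3))
```

On each decade the density of type (13) must fall by exactly 10, whereas the exponential falls by a factor that goes from nearly 1 to several thousand.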

In the numerical computation of integrals, the trapezoidal rule seems better
than the rectangle method. One could approximate g by trapezoids: the
second part of Theorem 1 would give a density on $\mathbb{R}$ satisfying Benford. But
the constraint remains that the intervals are given: they are the $[n, n+1]$.

3) Limit laws obtained by involved processes could lead to Benford: this is
suggested by Tao [T1] (see also a book [T2] that I have not seen) and the theorem
by Hill [Hi1, Th.3 p. 361], [BH2, Section 6.2, especially Th.6.20]. I reproduce
a paragraph of [BH2, p. 118] about the key hypothesis: "Justification of the
hypothesis of scale- or base-unbiasedness of significant digits in practice is
akin to justification of the hypothesis of independence (and identical
distribution) when applying the Strong Law of Large Numbers or the Central
Limit Theorem to real-life processes: Neither hypothesis can be formally
proved, yet in many real-life sampling procedures, they appear to be
reasonable assumptions."